The Google DeepMind team, in collaboration with academic institutions, has developed a method called Generative Reward Model (GenRM) that aims to make generative AI more accurate and reliable on reasoning tasks. GenRM builds verification into text generation, so a single model can both generate candidate solutions and evaluate them. It also supports Chain-of-Thought (CoT) reasoning, which makes the verification step more thorough. Compared with traditional verification methods, GenRM showed clear advantages across multiple tests, with accuracy improvements ranging from 16% to 64%, particularly in...
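To make the "generate and evaluate" idea concrete, here is a minimal sketch of how a generative verifier could rerank candidate solutions. It assumes that the verifier scores a solution by the probability it assigns to answering "Yes" when asked whether the solution is correct, and that with CoT it samples several verification rationales and averages those probabilities. All names here (`p_yes`, `genrm_score`, `best_of_n`, `toy_p_yes`) are hypothetical, and `toy_p_yes` is a seeded random stand-in for a real language model, not DeepMind's implementation.

```python
import random
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: given (question, solution), sample one CoT
# verification rationale and return the model's probability of "Yes".
YesProb = Callable[[str, str], float]


def genrm_score(question: str, solution: str, p_yes: YesProb,
                num_rationales: int = 4) -> float:
    """Average P("Yes") over several independently sampled rationales."""
    return mean(p_yes(question, solution) for _ in range(num_rationales))


def best_of_n(question: str, candidates: Sequence[str], p_yes: YesProb,
              num_rationales: int = 4) -> str:
    """Return the candidate solution with the highest verifier score."""
    return max(candidates,
               key=lambda s: genrm_score(question, s, p_yes, num_rationales))


# Toy stand-in for a real verifier model (assumption, for illustration only):
# correct-looking answers get a base score of 0.8, others 0.3, plus noise
# that mimics variation between sampled rationales.
_rng = random.Random(0)


def toy_p_yes(question: str, solution: str) -> float:
    base = 0.8 if solution.strip().endswith("= 4") else 0.3
    return base + _rng.uniform(-0.1, 0.1)


candidates = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
print(best_of_n("What is 2 + 2?", candidates, toy_p_yes))
```

Averaging over several rationales, rather than trusting a single verification pass, smooths out the variance of any one sampled chain of thought; in this toy setup the noise ranges never overlap, so the selected answer is always `2 + 2 = 4`.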